Temperature-Aware Modeling and Banking of IC Lifetime Reliability
نویسندگان
چکیده
Most existing integrated circuit (IC) reliability models assume a uniform, typically worst-case, operating temperature, but temporal and spatial temperature variations affect expected device lifetime. As a result, design decisions and dynamic thermal management (DTM) techniques using worst-case models are pessimistic and result in excessive design margins and unnecessary runtime engagement of cooling mechanisms (and associated performance penalties). By leveraging a reliability model that accounts for temperature gradients (dramatically improving interconnect lifetime prediction accuracy) and modeling expected lifetime as a resource that is consumed over time at a temperatureand voltage-dependent rate, substantial design margin can be reclaimed and runtime penalties avoided while meeting expected lifetime requirements. In this paper, we evaluate the potential benefits and implementations of this technique by tracking the expected lifetime of a system under different workloads while accounting for the impact of dynamic voltage and temperature variations. Simulation results show that our dynamic reliability management (DRM) techniques provide a 40% performance penalty reduction over that incurred by pessimistic DTM in general-purpose computing and a 10% increase in quality of service (QoS) for servers, all while preserving the expected IC lifetime reliability.
منابع مشابه
Interconnect Lifetime Prediction with Temporal and Spatial Temperature Gradients for Reliability-Aware Design and Runtime Management: Modeling and Applications
Thermal effects are becoming a limiting factor in high performance circuit design due to the strong temperature dependence of leakage power, circuit performance, IC package cost and reliability. While many interconnect reliability models assume a constant temperature, this paper analyzes the effects of temporal and spatial thermal gradients on interconnect lifetime in terms of electromigration....
متن کاملThe Importance of Temporal and Spatial Temperature Gradients in IC Reliability Analysis
Existing IC reliability models assume a uniform, typically worst-case, operating temperature, but temporal and spatial temperature variations affect expected device lifetime. This paper presents a model that accounts for temperature gradients, dramatically improving interconnect and gate-oxide lifetime prediction accuracy. By modeling expected lifetime as a resource that is consumed over time a...
متن کاملManaging Reliability of Integrated Circuits: Lifetime Metering and Design for Healing
In nanoscale CMOS devices, manufacturing control, process variations and device reliability have emerged as dominant concerns. Reliability problems manifest over lifetime of ICs in multiple ways, from gradual degradation in performance to increasing leakage current to catastrophic failures. Each of these effects are modulated by workloads executing on ICs and ambient conditions which influence ...
متن کاملAnalysis of Temporal and Spatial Temperature Gradients for IC Reliability
One of the most common causes of IC failure is interconnect electromigration (EM), which exhibits a rate that is exponentially dependent on temperature. As a result, EM rate is one of the major determinants of the maximum tolerable operating temperature for an IC and of resulting cooling costs. Previous EM models have assumed a uniform, typically worst-case, temperature. This paper presents a m...
متن کاملInterconnect Lifetime Prediction for Temperature - Aware Design
Thermal effects are becoming a limiting factor in high-performance circuit design due to the strong temperature-dependence of leakage power, circuit performance, IC package cost and reliability. Temperature-aware design tries to maximize performance under a given thermal envelope through various static and dynamic approaches. While existing interconnect reliability models assume a constant temp...
متن کامل